Multi-modal medical image completion has been widely applied to alleviate the missing-modality problem in many multi-modal diagnostic tasks. However, for most existing synthesis methods, the inference of missing modalities can collapse into a deterministic mapping, ignoring the uncertainty inherent in cross-modal relationships. Here we propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM), which leverages the strength of score-based generative models (SGMs) in modeling and stochastically sampling a target probability distribution, and further extends the SGM to conditional synthesis under various missing-modality configurations within a unified cross-modal framework. Specifically, UMM-CSGM employs a novel multi-in multi-out conditional score network (MM-CSN) to learn a comprehensive set of cross-modal conditional distributions via conditional diffusion and reverse generation in the complete modality space. In this way, the generation process can be accurately conditioned on all available information, and a single network can cover every possible configuration of missing modalities. Experiments on the BraTS19 dataset show that UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular areas of tumor-induced lesions under any missing-modality configuration.
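To make the sampling procedure concrete, here is a minimal, self-contained sketch of conditional reverse-diffusion sampling in the spirit of UMM-CSGM. It is not the authors' code: the network architecture, the VE-SDE noise schedule, and all shapes are toy assumptions; only the idea of conditioning the score network on every available modality comes from the abstract.

```python
# Hedged sketch: conditional reverse diffusion where the score network sees
# all available modalities as a condition and denoises the missing one.
import torch
import torch.nn as nn

class CondScoreNet(nn.Module):
    """Toy score network for the missing modality given available ones."""
    def __init__(self, n_cond=3):
        super().__init__()
        # input: noisy target modality + condition modalities + time channel
        self.net = nn.Sequential(
            nn.Conv2d(1 + n_cond + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t, cond, t):
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, cond, t_map], dim=1))

@torch.no_grad()
def sample_missing_modality(score_net, cond, n_steps=200, sigma_max=10.0):
    """Euler-Maruyama reverse VE-SDE sampling, conditioned on `cond`."""
    x = sigma_max * torch.randn(cond.shape[0], 1, *cond.shape[2:])
    ts = torch.linspace(1.0, 1e-3, n_steps)
    dt = ts[0] - ts[1]
    for t in ts:
        sigma = sigma_max ** t                       # toy noise schedule
        g2 = (sigma ** 2) * 2 * torch.log(torch.tensor(sigma_max))
        score = score_net(x, cond, t.expand(cond.shape[0]))
        x = x + g2 * score * dt + torch.sqrt(g2 * dt) * torch.randn_like(x)
    return x

# usage: synthesize one missing MR contrast from three available ones
cond = torch.randn(2, 3, 64, 64)                     # e.g., T1, T2, FLAIR stacked
net = CondScoreNet(n_cond=3)
fake = sample_missing_modality(net, cond)
print(fake.shape)                                    # torch.Size([2, 1, 64, 64])
```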
By using image-level classification labels to supervise its learning process, weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization. However, current WSOL methods suffer from over-activation at background locations and require post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues and proposes the background-aware classification activation map (B-CAM), which simultaneously learns localization scores for both the object and the background using only image-level labels. In our B-CAM, two image-level features, aggregated from pixel-level features at potential background and object locations, are used to purify the object feature from object-related background and to represent the feature of pure-background samples, respectively. Based on these two features, an object classifier and a background classifier are then learned to determine the binary object localization mask. Our B-CAM can be trained end-to-end with the proposed stagger classification loss, which not only improves object localization but also suppresses background activation. Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages, and VOC2012 datasets.
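The following is a simplified, hypothetical reading of the B-CAM mechanism, not the authors' implementation: a soft background map splits pixel features into an object-level and a background-level image feature, each feeding its own classifier. The backbone, the sigmoid-based aggregation, and the loss combination are my stand-ins.

```python
# Simplified sketch of the B-CAM idea: pool pixel features into an object
# feature and a background feature, then train two classifiers from them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBCAM(nn.Module):
    def __init__(self, n_classes, dim=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, 3, padding=1)   # stand-in backbone
        self.bg_head = nn.Conv2d(dim, 1, 1)               # background score map
        self.obj_cls = nn.Linear(dim, n_classes)
        self.bg_cls = nn.Linear(dim, 1)

    def forward(self, img):
        feat = self.backbone(img)                         # B x D x H x W
        bg_map = torch.sigmoid(self.bg_head(feat))        # B x 1 x H x W
        obj_map = 1.0 - bg_map
        # image-level features pooled from likely object / background pixels
        f_obj = (feat * obj_map).sum((2, 3)) / (obj_map.sum((2, 3)) + 1e-6)
        f_bg = (feat * bg_map).sum((2, 3)) / (bg_map.sum((2, 3)) + 1e-6)
        return self.obj_cls(f_obj), self.bg_cls(f_bg), obj_map

model = TinyBCAM(n_classes=200)
img, label = torch.randn(4, 3, 32, 32), torch.randint(0, 200, (4,))
obj_logits, bg_logit, obj_map = model(img)
# the object feature should classify the image; the background head should
# recognize the pooled background feature as background
loss = F.cross_entropy(obj_logits, label) + F.binary_cross_entropy_with_logits(
    bg_logit, torch.ones_like(bg_logit))
loss.backward()
# at test time, threshold obj_map to get the binary localization mask
mask = (obj_map > 0.5).float()
```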
Purpose: Deep neural networks (DNNs) have been widely applied to medical image classification, benefiting from their powerful mapping capability on medical images. However, these existing deep-learning-based methods depend on large quantities of carefully labeled images, and noise is inevitably introduced during the labeling process, degrading model performance. It is therefore important to devise robust training strategies that mitigate label noise in medical image classification tasks. Methods: In this work, we propose a novel Bayesian-statistics-guided label renovation mechanism (BLRM) to prevent overfitting to noisy images. BLRM uses the maximum a posteriori (MAP) probability from Bayesian statistics together with a designed time-weighting technique to selectively correct the labels of noisy images. When BLRM is activated, the training images are gradually purified epoch by epoch, further improving classification performance. Results: Comprehensive experiments on synthetically noisy images (the public OCT and Messidor datasets) and real-world noisy images (Animal-10N) show that BLRM selectively renovates noisy labels, curbing the adverse effects of noisy data. Moreover, the anti-noise BLRM integrated with a DNN is effective under different noise ratios, is independent of the backbone DNN architecture, and outperforms state-of-the-art anti-noise methods. Conclusions: These studies indicate that the proposed BLRM can mitigate label noise in medical image classification tasks.
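A toy sketch of a time-weighted, MAP-style label renovation rule in the spirit of BLRM follows; the paper's exact Bayesian statistics and activation schedule differ. The idea shown: accumulate per-sample class posteriors over epochs with recency weighting, and relabel a sample only when the accumulated posterior confidently disagrees with its current label.

```python
# Hedged sketch of selective label renovation; gamma and threshold are
# illustrative assumptions, not the paper's values.
import numpy as np

def renovate_labels(prob_history, labels, gamma=0.9, threshold=0.95):
    """prob_history: (epochs, n_samples, n_classes) softmax outputs."""
    epochs = prob_history.shape[0]
    w = gamma ** np.arange(epochs - 1, -1, -1)       # recent epochs weigh more
    posterior = np.einsum('e,esc->sc', w, prob_history) / w.sum()
    map_class = posterior.argmax(axis=1)             # MAP estimate per sample
    map_prob = posterior.max(axis=1)
    renovate = (map_class != labels) & (map_prob > threshold)
    new_labels = np.where(renovate, map_class, labels)
    return new_labels, renovate

# usage: after a few warm-up epochs, purify labels before the next epoch
history = np.random.dirichlet(np.ones(5), size=(10, 100))   # (epochs, N, C)
labels = np.random.randint(0, 5, size=100)
new_labels, flipped = renovate_labels(history, labels)
print(f"renovated {flipped.sum()} of {len(labels)} labels")
```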
Despite significant progress in object categorization in recent years, a number of important challenges remain; chiefly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and address the problems of supervised, zero-shot, generalized zero-shot, and open-set recognition within a unified framework. Specifically, we propose a weighted maximum-margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. These distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open-set recognition, with up to a 310K-class vocabulary, on the Animals with Attributes and ImageNet datasets.
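The central distance constraint can be written as a hinge loss over all vocabulary atoms. The sketch below is an illustrative simplification (the function name, margin value, and toy embedding are mine, and the paper's weighted formulation is richer): each labeled sample must be closer to its own prototype than to any other atom by a margin.

```python
# Hedged sketch of a vocabulary-informed max-margin distance constraint.
import torch
import torch.nn.functional as F

def vocab_informed_margin_loss(z, y, prototypes, margin=0.1):
    """z: (B, d) embedded samples; y: (B,) class ids; prototypes: (V, d)
    vocabulary atoms covering supervised and zero-shot classes alike."""
    d = torch.cdist(z, prototypes)                 # (B, V) distances
    d_pos = d.gather(1, y.unsqueeze(1))            # distance to own prototype
    mask = torch.ones_like(d, dtype=torch.bool)
    mask.scatter_(1, y.unsqueeze(1), False)        # drop the positive column
    viol = F.relu(d_pos + margin - d)              # hinge: be closer by margin
    return viol[mask].mean()

# usage with a toy embedding against a large open vocabulary
proto = F.normalize(torch.randn(310_000, 64), dim=1)
z = F.normalize(torch.randn(8, 64), dim=1).requires_grad_()
y = torch.randint(0, 310_000, (8,))
loss = vocab_informed_margin_loss(z, y, proto)
loss.backward()            # gradients pull samples toward correct prototypes
print(loss.item())
```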
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations follow from accurate predictions/classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy for inversely assuring fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, and apply it to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the respective discriminative representations, accurately distinguishing preterm infants from term-born infants at term-equivalent age. The hierarchical attention-decoding modules are learned under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge about brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations alongside accurate classifications. Experimental results on the public dHCP benchmark show that NeuroExplainer yields quantitatively reliable explanations that are qualitatively consistent with representative neuroimaging studies.
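As a rough illustration of coupling explanation quality to training, the sketch below adds sparsity- and stability-style penalties on an attention map next to the classification loss. This is my own simplification; the paper's regularizers are deduced from brain-development domain knowledge and operate on hierarchical attention maps over the cortical surface.

```python
# Hedged sketch: train attention to be sparse and stable while classifying.
import torch
import torch.nn.functional as F

def explainability_regularizers(attn, attn_perturbed, eps=1e-8):
    """attn: (B, N) vertex-wise attention over a cortical surface (rows sum
    to 1); attn_perturbed: attention recomputed from a perturbed input."""
    sparsity = -(attn * (attn + eps).log()).sum(1).mean()   # low entropy
    stability = F.mse_loss(attn, attn_perturbed)            # consistent maps
    return sparsity, stability

B, N, C = 4, 1000, 2
logits = torch.randn(B, C, requires_grad=True)              # stand-in outputs
attn = torch.softmax(torch.randn(B, N, requires_grad=True), dim=1)
attn_jit = torch.softmax(torch.randn(B, N), dim=1)          # perturbed pass
labels = torch.randint(0, C, (B,))

sparse, stable = explainability_regularizers(attn, attn_jit)
loss = F.cross_entropy(logits, labels) + 0.1 * sparse + 0.1 * stable
loss.backward()
```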
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment-effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and maintain low confidence in such black-box systems, a problem exacerbated by poor generalization on out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the black box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes AI's behavior visible in uncertain regions and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in clinical settings. Our work sheds light on clinically safe applications and explainable AI, and can contribute to trustworthiness in the medical domain.
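The subjective-logic uncertainty mentioned above is commonly realized with the standard evidential formulation sketched here (EvidenceCap's full model has more components): non-negative class evidence parameterizes a Dirichlet whose strength yields per-voxel belief masses and an explicit uncertainty mass.

```python
# Standard evidential / subjective-logic sketch, assumed as the basis here.
import torch
import torch.nn.functional as F

def evidential_segmentation(logits):
    """logits: (B, K, H, W) raw network outputs for K classes."""
    evidence = F.softplus(logits)            # non-negative evidence
    alpha = evidence + 1.0                   # Dirichlet parameters
    S = alpha.sum(dim=1, keepdim=True)       # Dirichlet strength
    prob = alpha / S                         # expected class probabilities
    belief = evidence / S                    # per-class belief mass
    uncertainty = logits.shape[1] / S        # u = K / S, in (0, 1]
    return prob, belief, uncertainty

logits = torch.randn(1, 2, 128, 128)         # toy binary segmentation
prob, belief, u = evidential_segmentation(logits)
seg = prob.argmax(1)                         # segmentation mask
# high-u voxels flag unreliable regions (e.g., OOD inputs) for clinicians
print(u.mean().item(), u.max().item())
```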
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
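A minimal sketch of the two-stage idea: a per-frame classifier score followed by a temporal smoother, since a flashing light is off in many individual frames. The paper does not specify this exact smoother; the forward HMM filter below is one plausible choice, with made-up transition probabilities.

```python
# Hedged sketch: temporal smoothing of frame-level "active EV" scores.
import numpy as np

def hmm_filter(frame_probs, p_stay=0.95):
    """frame_probs: (T,) per-frame P(active EV) from the CNN.
    Returns filtered posteriors that integrate temporal context."""
    trans = np.array([[p_stay, 1 - p_stay],
                      [1 - p_stay, p_stay]])        # inactive <-> active
    belief = np.array([0.5, 0.5])
    out = np.empty_like(frame_probs)
    for t, p in enumerate(frame_probs):
        belief = trans.T @ belief                   # predict step
        like = np.array([1 - p, p])                 # frame-level evidence
        belief = like * belief
        belief /= belief.sum()                      # update step
        out[t] = belief[1]
    return out

# a flashing light: the CNN fires only on "on" frames, the filter stays high
raw = np.array([0.9, 0.1, 0.85, 0.15, 0.9, 0.1, 0.8], dtype=float)
print(hmm_filter(raw).round(2))
```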
Seismic data often suffers from severe noise caused by environmental factors, which seriously affects subsequent applications. Traditional hand-crafted denoisers such as filters and regularizations exploit interpretable domain knowledge to design generalizable denoising techniques, but their representation capacity may be inferior to that of deep learning denoisers, which can learn complex and representative denoising mappings from abundant training pairs. However, owing to the scarcity of high-quality training pairs, deep learning denoisers may suffer generalization issues across different scenarios. In this work, we propose a self-supervised method that combines the representation capacity of a deep denoiser with the generalization ability of hand-crafted regularization for attenuating random noise in seismic data. Specifically, we leverage the Self2Self (S2S) learning framework with a trace-wise masking strategy for seismic data denoising using only the observed noisy data. In parallel, we use a weighted total variation (WTV) term to further capture the horizontal local smooth structure of seismic data. Our method, dubbed S2S-WTV, enjoys both the high representation ability of the self-supervised deep network and the good generalization ability of the hand-crafted WTV regularizer, and can therefore remove random noise more effectively and stably while preserving the details and edges of the clean signal. To solve the S2S-WTV optimization model, we introduce an alternating direction method of multipliers (ADMM)-based algorithm. Extensive experiments on synthetic and field noisy seismic data demonstrate the effectiveness of our method compared with state-of-the-art traditional and deep-learning-based seismic data denoising methods.
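To illustrate the two ingredients, the sketch below shows a trace-wise Bernoulli mask for Self2Self-style self-supervision and a simple anisotropic weighted-TV term; the network is replaced by an identity stand-in and the paper's ADMM alternation is omitted, so treat every constant here as an assumption.

```python
# Hedged sketch: trace-wise masking + weighted TV on a seismic gather.
import torch

def trace_mask(gather, drop_prob=0.3):
    """Zero out random traces; a network must predict them from neighbors."""
    B, H, W = gather.shape                        # W traces of H time samples
    keep = (torch.rand(B, 1, W) > drop_prob).float()
    return gather * keep, keep

def weighted_tv(x, w_h=1.0, w_v=0.1):
    """Anisotropic TV favoring smoothness across traces (horizontal)."""
    dh = (x[:, :, 1:] - x[:, :, :-1]).abs().sum()   # across traces
    dv = (x[:, 1:, :] - x[:, :-1, :]).abs().sum()   # along time
    return w_h * dh + w_v * dv

noisy = torch.randn(1, 256, 96)                   # toy noisy gather
masked, keep = trace_mask(noisy)
denoised = masked                                 # stand-in for the S2S network
# S2S loss on the held-out traces + WTV regularization on the estimate
s2s_loss = (((denoised - noisy) * (1 - keep)) ** 2).sum() / (1 - keep).sum()
loss = s2s_loss + 1e-3 * weighted_tv(denoised)
print(loss.item())
```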
With the development of natural language processing (NLP) techniques, automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. The task is to evaluate the condition of each of a patient's eyes, and we formulate it as a particular multi-label classification problem in this paper. Although a few related studies exist for other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of the two eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparse and cluttered information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical that the disease diagnosis model be explainable. To overcome these challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. A preprocessing module is integrated to improve the density and quality of information. We then design a hierarchical transformer structure to learn contextualized representations of each sentence in an OEMR document. For diagnosis, we propose an attention-based predictor that enables traceable diagnosis by extracting disease-specific information. Experiments on a real dataset and comparisons with several baseline models show the advantages and explainability of our framework.
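A rough sketch of the hierarchical encoding plus disease-specific attention pooling is given below. All module sizes, the mean-pooled sentence vectors, and the class name are illustrative assumptions; only the two-level transformer and the attention-based, traceable predictor mirror the description above.

```python
# Hedged sketch: sentence-level then document-level transformers, followed by
# per-disease attention over sentences so each diagnosis is traceable.
import torch
import torch.nn as nn

class HierEyeDiagnoser(nn.Module):
    def __init__(self, vocab=5000, dim=128, n_diseases=10):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, dim)
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(enc, num_layers=1)  # within sentence
        self.doc_enc = nn.TransformerEncoder(enc, num_layers=1)   # across sentences
        self.disease_q = nn.Parameter(torch.randn(n_diseases, dim))
        self.out = nn.Linear(dim, 1)

    def forward(self, doc):                       # doc: (B, S, T) token ids
        B, S, T = doc.shape
        tok = self.tok_emb(doc).view(B * S, T, -1)
        sent = self.sent_enc(tok).mean(dim=1).view(B, S, -1)  # sentence vectors
        sent = self.doc_enc(sent)                             # contextualized
        # disease-specific attention over sentences -> traceable evidence
        attn = torch.softmax(self.disease_q @ sent.transpose(1, 2), dim=-1)
        pooled = attn @ sent                                  # (B, D, dim)
        return self.out(pooled).squeeze(-1), attn             # logits + evidence

model = HierEyeDiagnoser()
logits, attn = model(torch.randint(0, 5000, (2, 12, 20))))if False else model(torch.randint(0, 5000, (2, 12, 20)))
print(logits.shape, attn.shape)    # (2, 10) logits; (2, 10, 12) attention
```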
Feature transformation for AI is an essential task for boosting the effectiveness and interpretability of machine learning (ML). Feature transformation aims to transform the original data to identify an optimal feature space that enhances the performance of a downstream ML model. Existing studies either combine preprocessing, feature selection, and generation skills to transform data empirically, or automate feature transformation with machine intelligence such as reinforcement learning. However, they suffer from: 1) high-dimensional, non-discriminative feature spaces; 2) inability to represent complex situational states; 3) inefficiency in integrating local and global feature information. To fill this research gap, we formulate the feature transformation task as an iterative, nested process of feature generation and selection, where feature generation produces and adds new features derived from the original features, and feature selection removes redundant features to control the size of the feature space. Finally, we present extensive experiments and case studies showing 24.7% improvements in F1 score over state-of-the-art methods, as well as robustness on high-dimensional data.
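A toy version of the iterative, nested generate-then-select loop might look as follows; the generation operator (pairwise products) and the importance-based selection rule are my simplifications, not the paper's learned policy.

```python
# Hedged sketch: alternate feature generation and selection for a few rounds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def generate(X):
    """Add simple crossed features derived from the current space."""
    n = X.shape[1]
    new = [X[:, i] * X[:, j] for i in range(n) for j in range(i + 1, n)]
    return np.column_stack([X] + new) if new else X

def select(X, y, max_features=15):
    """Keep the most important features to control the space size."""
    rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    keep = np.argsort(rf.feature_importances_)[::-1][:max_features]
    return X[:, keep]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 4)), rng.integers(0, 2, 300)
for it in range(3):                       # nested generation + selection
    X = select(generate(X), y)
    score = cross_val_score(RandomForestClassifier(random_state=0), X, y).mean()
    print(f"iter {it}: {X.shape[1]} features, CV accuracy {score:.3f}")
```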